[don't merge yet] Fix package names and travis config
Update instructions with Cloud Object Storage
* arch diag * arch diag * Update README.md * adding specs * adding specs
* update none VM_TYPE * polish export commands
* Extend make test-submit waiting time
…BM#22) * assign default edit role to lcm * add helm value options for 1.7 and below
* adding prereqs and bumping user guide in front
…ors) (IBM#24) * add caffe2 and pytorch cpu support * update LCM, learner config file, and example jobs * fix pytorch example bug * Update gpu-guide.md * Update gpu-guide.md * merge CPU and GPU examples into a single example * add more tf framework versions * fix typo * add S3 prereq
* update UI instructions * fix command
* adding contributors * Update README.md
* Updating maintainers file * Update MAINTAINERS.md
* add converting script * update converter readme and update tensorflow version * update troubleshooting * Update README.md * Update gpu-guide.md * Update README.md * Update README.md * Update README.md
* Adding references to Watson Studio * Update README.md * Rename README.md to ffdl-wml.md * Update README.md * Create train-deploy-wml.md * Update train-deploy-wml.md * Update README.md * Update ffdl-wml.md * Update ffdl-wml.md * Update ffdl-wml.md * Update README.md * Update README.md * update WML instructions * revert tf example * update caffe manifest
* Update feature-gates for k8s 1.9.4 and above * Update troubleshooting * Update README.md
…ild (IBM#51) * Add codebase configuration for device plugin and custom learner images * Add developer guide for those who want to do a custom FfDL build * update developer-guide * fix declare type
Plus minor fixes
* Creating CLA * Update CLA.md * Update CONTRIBUTING.md
* add detailed H2O instructions
* H2O arch image
* upload Chinese README file * add Chinese README file hyperlinks to the README file * modify
Pre-0.1 release: Add Object Storage mount and other enhancements.
* Architecture Details
* init commit for horovod patch * update examples and docs * update docs and converter script * update example readme * update example readme * modify horovod examples with real workload * modify horovod examples * update sed syntax to be more visual friendly * add troubleshooting for dind cluster * remove deprecated instructions
* horovod
* Remove 4 minute timeout for log follow process (IBM#106) The process that follows the training logs of an ongoing training job should not time out after 4 minutes. Instead, the log follow process should complete after the training job itself is finished. This behavior is necessary to enable chaining commands into machine learning pipelines, where subsequent commands require the output data of the training job whose logs are being "followed", as in our ART notebook. This commit reinstates the log follow behavior from before the merge of PR IBM#79. * Updates suggested by sboagibm: the intention was not to rely on a long-term stream being held open, but to be able to re-open a new stream starting from where the old one left off if the connection terminates.
* Update ART Notebook after PR IBM#79 - Load cluster configuration from environment variables - Require PUBLIC_IP and KUBECONFIG instead of CLUSTER_NAME and VM_TYPE - Use storage type "mount_cos" (s3fs) instead of "s3_datastore" * Update ART demo notebook after PR IBM#79 - Load cluster configuration from environment variables - Require PUBLIC_IP and KUBECONFIG instead of CLUSTER_NAME and VM_TYPE - Use storage type "mount_cos" (s3fs) instead of "s3_datastore"
* update dl framework versions * update examples with new framework tags
* update fashion mnist example with seldon 0.2 * fix readme
* Pointed travis testing to do hostmount minikube * Debugging permissions error. * Fix to mkdir problems. * Fixed Makefile syntax. * Printing debugging information about pods. * Printing debugging information about pods. * Printing debugging information about pods. * Printing debugging information incl kubectl get pod. * Enabled debug mode. * Again. * Set debug as default. * tracing from the trainer to lcm * more debugging * added lower level logging * dist: xenial * Update .travis.yml * fix typo * Trying to fix Travis issue. * Fixed Travis issue. * Followed Tommy's request and increased resource limits to values from before. Might break CI. * Parameterized memory values like Tommy requested. * Attempt to fix CI. * Removed excessive debug statements and cleaned comments. Probably breaks code. * DLaaS pull june 14, with security mods * fixed glide problem * Added Image.go etc. files, deleted learner_test.go * temporarily disable framework validation * FIXME: Disable validation check for bucket until conditionalize for s3fs vs. option. * fixed two bugs related to volume mounting * I think mostly just logging changes * basic success * Add FfDL.iml to .gitignore * removed docker ref to csf_env.properties * Test for mount_cos before attempting s3 validation * fixed hostmount by pre-setup of model code in Makefile * fixed missing import * log HELM_DEPLOY_DIR, add a bunch of logging for the ci test * Added create-volumes to jenkins file, more verbose docker build for ui * Wound back Angular to 6.0.8 * Quiet docker-build-ui docker build * merged bin/create_static_volumes_config2.sh into bin/create_static_volumes_config.sh
* update prebuild image version, update helm chart to 0.1.1 * fix make deploy bug
…ce (IBM#110) * make helm charts and scripts compatible with deploying FfDL on any namespace * allow users to export all the environment variables in a txt file * Update readme with new notice * Fix typo * Update static volumes config v2 namespace parameter * capitalize NAMESPACE, update Makefile, developer guide, and troubleshooting.
LGTM. Ran fine / fixed statsd issue on Ubuntu 18.04 Vagrant VM.
* Simplifying README * Simplifying README * Create detailed-installation-instructions.md * Update README.md * Update detailed-installation-instructions.md * Update and rename detailed-installation-instructions.md to detailed-installation-guide.md * Update detailed-installation-guide.md * Update detailed-installation-guide.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md
This code adds ppc64le and multi-arch support to FfDL.
- Adds a new target to the main Makefile, **docker-create-manifest**,
that creates the multi-arch manifests for the services.
- An arch (architecture) parameter was added to values.yaml and
storage-plugin/values.yaml.
- **make docker-build** now builds every service with a -${ARCHITECTURE}
suffix. During the build, an ARCH argument is passed to the Dockerfiles so
that they can contain architecture-specific logic.
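To make the suffix scheme concrete, here is a minimal shell sketch, assuming a build host whose `uname -m` reports either `x86_64` or `ppc64le`. The helper names `docker_arch` and `suffixed_image` and the `lcm` image name are hypothetical and not taken from the FfDL Makefile; the sketch only shows how a machine name can be mapped to a Docker architecture name, appended to an image name, and passed to a Dockerfile as the ARCH build argument.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the FfDL Makefile.

# docker_arch MACHINE: print the Docker architecture name for a `uname -m` value
docker_arch() {
  case "$1" in
    x86_64)  echo "amd64" ;;
    ppc64le) echo "ppc64le" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# suffixed_image NAME ARCH: print NAME-ARCH, e.g. lcm-ppc64le
suffixed_image() {
  echo "$1-$2"
}

# Example usage (assumes the Dockerfile declares `ARG ARCH`):
#   arch="$(docker_arch "$(uname -m)")"
#   docker build --build-arg ARCH="$arch" -t "$(suffixed_image lcm "$arch")" .
```

Failing loudly on an unknown machine name keeps a build from silently producing an image with a wrong or empty suffix.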
Building docker images should look like this:
make docker-build
make docker-push
make docker-create-manifest
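The manifest step can be pictured with the following dry-run sketch. It is not the docker-create-manifest target from this PR: the function `print_manifest_cmds` and the `myregistry/lcm` image are hypothetical, and it assumes the per-architecture images (`IMAGE-ARCH:TAG`) have already been pushed. It only prints `docker manifest` subcommands (`create --amend`, `annotate --arch`, `push`) so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch; NOT the docker-create-manifest target of this PR.

# print_manifest_cmds IMAGE TAG ARCH...: print the docker manifest commands
# that would combine IMAGE-ARCH:TAG images into the manifest list IMAGE:TAG.
print_manifest_cmds() {
  local image="$1" tag="$2"
  shift 2
  local list="${image}:${tag}"
  local members=()
  local arch
  for arch in "$@"; do
    members+=("${image}-${arch}:${tag}")
  done
  # --amend lets the command update a manifest list that already exists locally
  echo "docker manifest create --amend ${list} ${members[*]}"
  for arch in "$@"; do
    # Record the architecture explicitly for each member image
    echo "docker manifest annotate --arch ${arch} ${list} ${image}-${arch}:${tag}"
  done
  echo "docker manifest push ${list}"
}
```

Calling `print_manifest_cmds myregistry/lcm latest amd64 ppc64le` prints one create line, two annotate lines, and a push line. The annotate step is often redundant, since `docker manifest create` normally reads each image's platform, but it makes the architecture explicit.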
Hi all, this code adds support for ppc64le in FfDL. It adds an architecture suffix to the built services and also adds a target to the main Makefile that creates or amends the manifest lists needed to support multiple architectures. Two images needed by the code, prom/pushgateway and localstack/localstack (optional), do not have a ppc64le version in the public registries. I built custom ppc64le versions and pushed them to my own registry, smonov; they will be pulled from there until they are published to a more official registry.
values.yaml
expose_node_port: true
docker:
  registry: docker.io
  registry: ffdl.ibm.com
This changes our official Docker image location.
Sorry, my mistake. This was not supposed to be there. Will fix it.
Found a few issues. Fixing and testing them now:
Thanks @sdmonov. What is the Travis target, if any?
Couldn't we put the multi-arch storage plugin in a private repo and use it? The public Travis CI is now multi-arch, so you should do builds with it. I would like to see the changes in the .travis.yml file. We also have to deal with the learner images; is that addressed in a different PR?
- added docker-tag-local target in Makefile
- fixed a few issues in docker-create-manifest target in Makefile
- small fix in docker-push to avoid duplicate code
This code adds ppc64le and multi-arch support to FfDL.
Adds a new target to the main Makefile, docker-create-manifest,
that creates the multi-arch manifests for the services.
An arch (architecture) parameter was added to values.yaml and
storage-plugin/values.yaml.
make docker-build generates all the services with a -${ARCHITECTURE}
suffix. During the build, an ARCH argument is passed to the Dockerfiles so
that they can contain architecture-specific logic.
Building docker images for ppc64le should look like this:
Fixes #115
Developer's Certificate of Origin 1.1